The code below first cleans the environment to avoid carrying over stale functions or variables, then installs and loads the libraries used in this script. For the Bayesian data analysis (BDA) we mainly rely on brms (Bürkner 2020), while bayesplot (Gabry and Mahr 2020) provides plotting support and functions for calculating credible intervals.
rm( list = ls() ) # Cleans the environment.
# You can install packages in case they are not installed already.
# In case you don't have rstan installed, see:
# https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
# install.packages(c("tidyverse", "skimr", "ggthemes", "patchwork", "ggdag",
# "dagitty", "brms", "loo", "bayesplot"))
library(tidyverse) # For transforming and visualizing data.
library(skimr) # For getting data summaries
library(ggthemes) # Themes for ggplot
ggplot2::theme_set(theme_tufte())
library(patchwork) # Combining many plots into the same figure.
library(ggdag) # For DAG analysis
library(dagitty) # For DAG analysis
library(brms) # BDA packages. Alternatively, one can use rethinking & rstanarm.
library(loo) # For comparing different models' performance
library(bayesplot) # Plotting BDA output by Gabry et al.
bayesplot::color_scheme_set("viridis") #Uses the viridis palette on bayesplots
For reproducibility and efficiency, we set an arbitrary seed, the sampling parameters, and the number of cores used to parallelize the MCMC sampling.
SAMPLES = 5000
WARMUP = 1000
CHAINS = 4
SEED = 2020
DELTA = 0.99
TREE = 13
set.seed(SEED)
options(mc.cores = parallel::detectCores())
First we load the data and take a look at it. As this report analyzes the performance differences between Bugspots and Linespots, we focus only on the variables relevant to that comparison.
d = read_delim(
'../data/full_evaluation.csv',
delim = ",",
locale = locale(decimal_mark = "."),
col_names = TRUE,
col_types = cols(
AUCEC1 = col_double(),
AUCEC100 = col_double(),
AUCEC20 = col_double(),
AUCEC5 = col_double(),
AUROC = col_double(),
Algorithm = col_factor(),
Depth = col_double(),
EInspect10 = col_double(),
EInspect100 = col_double(),
EInspect200 = col_double(),
EInspect50 = col_double(),
EInspectF = col_double(),
EXAM = col_double(),
FixCount = col_double(),
Future = col_double(),
LOC = col_double(),
Origin = col_double(),
Project = col_factor(),
Runtime = col_double(),
Time = col_factor(),
Weight = col_factor(),
comit_version = col_factor(),
commits = col_double(),
language = col_factor(),
url = col_factor()
)
)
# There seem to be some cases where no faults were found in the pseudo future.
# As those cases can't tell us anything, we will remove them.
d = subset(d, d$FixCount != 0)
# EInspectF is a fraction of LOC; multiplying by LOC turns it into a line count.
d$EInspectFLong = ceiling(d$EInspectF * d$LOC)
# For clarity, we only keep the variables we later use.
d = data.frame("Algorithm" = d$Algorithm, "LOC" = d$LOC, "FixCount" = d$FixCount,
"Project" = d$Project, "language" = d$language, "AUCEC1" = d$AUCEC1,
"AUCEC5" = d$AUCEC5, "AUROC" = d$AUROC, "EInspect10" = d$EInspect10,
"EInspect100" = d$EInspect100, "EInspectF" = d$EInspectFLong, "EXAM" = d$EXAM)
| | |
|---|---|
| Name | d |
| Number of rows | 480 |
| Number of columns | 12 |
| Column type frequency: factor | 3 |
| Column type frequency: numeric | 9 |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| Algorithm | 0 | 1 | FALSE | 2 | Lin: 240, Bug: 240 |
| Project | 0 | 1 | FALSE | 32 | mys: 18, woo: 18, pre: 18, ser: 18 |
| language | 0 | 1 | FALSE | 8 | Jav: 120, Jav: 72, Typ: 66, C: 54 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| LOC | 0 | 1 | 1678978.99 | 3027324.25 | 21182.00 | 269496.00 | 624173.00 | 1565072.00 | 20705587.00 | ▇▁▁▁▁ |
| FixCount | 0 | 1 | 91.80 | 72.35 | 7.00 | 40.75 | 69.00 | 115.00 | 418.00 | ▇▃▁▁▁ |
| AUCEC1 | 0 | 1 | 0.08 | 0.06 | 0.00 | 0.04 | 0.07 | 0.11 | 0.35 | ▇▆▂▁▁ |
| AUCEC5 | 0 | 1 | 0.20 | 0.10 | 0.00 | 0.12 | 0.19 | 0.26 | 0.56 | ▅▇▅▂▁ |
| AUROC | 0 | 1 | 0.02 | 0.03 | 0.00 | 0.00 | 0.01 | 0.02 | 0.16 | ▇▁▁▁▁ |
| EInspect10 | 0 | 1 | 0.10 | 0.34 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 | ▇▁▁▁▁ |
| EInspect100 | 0 | 1 | 0.67 | 1.30 | 0.00 | 0.00 | 0.00 | 1.00 | 8.00 | ▇▁▁▁▁ |
| EInspectF | 0 | 1 | 2993.67 | 10293.40 | 1.00 | 63.00 | 235.00 | 1160.00 | 112581.00 | ▇▁▁▁▁ |
| EXAM | 0 | 1 | 0.22 | 0.06 | 0.08 | 0.18 | 0.22 | 0.26 | 0.38 | ▂▅▇▃▁ |
With both LOC and FixCount spanning multiple orders of magnitude, we standardize them to improve sampling performance.
d$LOC = scale(d$LOC)
d$FixCount = scale(d$FixCount)
Based on the data we gathered, we built a DAG representing the causal relationships as we assume them. In this case, the analysis is rather simple: we designed the experiment so that there is no incoming causal relationship to Algorithm, which lets us use all parameters without confounding problems.
Runtime_dag <- dagify(
Project ~ Language,
LOC ~ Project,
FixCount ~ Project,
EvaluationMetrics ~ Algorithm + Project + LOC + FixCount + Language,
exposure = "Algorithm",
outcome = "EvaluationMetrics",
labels = c(
"Project" = "Project",
"Language" = "Language",
"LOC" = "LOC",
"FixCount" = "Fix\nCount",
"Algorithm" = "Algorithm",
"EvaluationMetrics" = "Evaluation\nMetrics"
)
)
ggdag(Runtime_dag, text = FALSE, use_labels = "label", layout="circle") +
theme_dag()
***
The graph shows that there is only a single possible causal path from ‘Algorithm’ to ‘Evaluation Metrics’, so no matter which other predictors we add to a model, they will not introduce bias or confounding.
ggdag_paths(Runtime_dag,
text = FALSE,
use_labels = "label",
shadow = TRUE,adjust_for = c("LOC", "Project", "Language", "FixCount"),
layout="circle") +
theme_dag()
Finally, we can test whether the four sets of predictors we plan on using are valid adjustment sets.
isAdjustmentSet(Runtime_dag, c("LOC"))
isAdjustmentSet(Runtime_dag, c("LOC", "FixCount"))
isAdjustmentSet(Runtime_dag, c("LOC", "FixCount", "Project"))
isAdjustmentSet(Runtime_dag, c("LOC", "FixCount", "Project", "Language"))
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
We are interested in the evaluation metrics as our outcome and Algorithm as our exposure, and we control for LOC, FixCount, Project, and language. We then build generalized linear models (GLMs) for different predictor combinations and compare them using PSIS-LOO.
For the intercept priors we used insights from past research, our own and others' in the field. The remaining priors were chosen so that the pp_check graphs show models allowing values outside of the observed data range, to prevent overfitting.
For the area under the ROC curve we use a beta likelihood, as AUROC represents the fraction of the unit square that lies under the ROC curve.
While we do not have past experience with AUROC values for these algorithms, we know from our past work that even in the early parts of the result lists, precision and recall are very low for both. To represent this, we center the intercept prior at 0.1 on the outcome scale, or roughly -2 on the logit scale.
As we presume most of the values to be small and somewhat concentrated, we expect the precision parameter phi to be fairly large. We put a wide prior on phi as we are not certain where exactly it will lie.
\[ \begin{split} \mathcal{M}_3: \mathrm{AUROC_i} \thicksim &\ \mathrm{Beta}(\mu_i, \phi) \\ \mathrm{logit}(\mu_i) = &\ \alpha + \beta_{A} \mathrm{Algorithm}_i + \beta_{L} \mathrm{LOC}_i + \beta_F\mathrm{FixCount}_i + \alpha_{Project[i]} \\ \alpha \thicksim &\ \mathrm{Normal}(-2, 1) \\ \beta_{A}, \beta_{L}, \beta_F \thicksim &\ \mathrm{Normal}(0, 0.25) \\ \alpha_p \thicksim &\ \mathrm{Weibull}(2, 1)\ \ \ \mathrm{for\ }p = 1..32 \\ \phi \thicksim &\ \mathrm{Normal}(50, 20) \end{split} \]
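As a quick numeric sanity check (not part of the original script), the logit transform confirms that an outcome-scale value of 0.1 sits at roughly -2 on the logit scale, so a Normal(-2, 1) intercept prior centers the model where we expect it:

```r
# qlogis() is the logit, plogis() its inverse (both in base R's stats package).
qlogis(0.1)  # log(0.1 / 0.9), approximately -2.20
plogis(-2)   # back on the outcome scale, approximately 0.12
```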
mauroc3p = brm(
formula = AUROC ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
data = d,
family = Beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mauroc3p, nsamples = 100) + scale_y_continuous(trans='sqrt')
mauroc3= brm(
formula = AUROC ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
data = d,
family = Beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
The LOO comparison (below) shows equal performance for M3 and M4. As M3 is the simpler model, we choose it as our candidate model.
Results:
summary(mauroc3)
## Family: beta
## Links: mu = logit; phi = identity
## Formula: AUROC ~ 1 + Algorithm + LOC + FixCount + (1 | Project)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.64 0.10 0.47 0.86 1.00 3517 6331
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -4.09 0.13 -4.35 -3.82 1.00 2953 4717
## AlgorithmBugspots -0.35 0.08 -0.51 -0.19 1.00 16001 12322
## LOC -0.15 0.09 -0.33 0.04 1.00 7034 10077
## FixCount 0.14 0.05 0.05 0.23 1.00 12482 11552
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 40.27 3.37 33.91 47.10 1.00 13338 11445
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The posterior predictive check shows a good fit.
pp_check(mauroc3, nsamples = 100) + scale_y_continuous(trans='sqrt')
On the logit scale the effect of Bugspots is clearly negative, with no overlap with 0 at all.
On the outcome scale there is some overlap between the two algorithms, but the
respective means lie outside the 95% intervals shown, and there is a clear
tendency for Linespots to have higher AUROC values than Bugspots.
mcmc_areas(mauroc3, pars = c("b_AlgorithmBugspots"), prob = 0.95) + ggtitle("AUROC Posterior Distribution\n With 95% intervals")
eff = conditional_effects(mauroc3, effects = c("Algorithm"))
eff$Algorithm
eff
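To get a feel for the size of this effect, we can convert the population-level posterior means from summary(mauroc3) to the outcome scale by hand, ignoring posterior uncertainty and group-level effects (a rough sketch, not a substitute for conditional_effects):

```r
# Expected AUROC at average LOC and FixCount, using posterior means only.
plogis(-4.09)         # Linespots (reference level): roughly 0.016
plogis(-4.09 - 0.35)  # Bugspots: roughly 0.012
```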
And diagnostics: all diagnostics look good.
min(neff_ratio(mauroc3))
## [1] 0.1839649
max(rhat(mauroc3))
## [1] 1.000528
plot(mauroc3)
mauroc1p = brm(
formula = AUROC ~ 1 + Algorithm + LOC,
data = d,
family = Beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mauroc1p, nsamples = 100) + scale_y_continuous(trans='sqrt')
mauroc1= brm(
formula = AUROC ~ 1 + Algorithm + LOC,
data = d,
family = Beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mauroc1, nsamples = 100) + scale_y_continuous(trans='sqrt')
summary(mauroc1)
## Family: beta
## Links: mu = logit; phi = identity
## Formula: AUROC ~ 1 + Algorithm + LOC
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -3.95 0.08 -4.10 -3.80 1.00 8852 8956
## AlgorithmBugspots -0.35 0.08 -0.51 -0.18 1.00 10228 10003
## LOC -0.22 0.05 -0.33 -0.12 1.00 9609 9029
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 25.69 2.21 21.51 30.26 1.00 7420 8287
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
mauroc2p = brm(
formula = AUROC ~ 1 + Algorithm + LOC + FixCount,
data = d,
family = Beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mauroc2p, nsamples = 100) + scale_y_continuous(trans='sqrt')
mauroc2= brm(
formula = AUROC ~ 1 + Algorithm + LOC + FixCount,
data = d,
family = Beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mauroc2, nsamples = 100) + scale_y_continuous(trans='sqrt')
summary(mauroc2)
## Family: beta
## Links: mu = logit; phi = identity
## Formula: AUROC ~ 1 + Algorithm + LOC + FixCount
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -3.98 0.08 -4.14 -3.83 1.00 9972 10655
## AlgorithmBugspots -0.30 0.08 -0.46 -0.14 1.00 12215 10808
## LOC -0.23 0.06 -0.35 -0.13 1.00 11026 10413
## FixCount 0.15 0.04 0.07 0.23 1.00 12624 10608
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 26.38 2.26 22.18 31.00 1.00 9021 9805
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
mauroc4p = brm(
formula = AUROC ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
data = d,
family = Beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mauroc4p, nsamples = 100) + scale_y_continuous(trans='sqrt')
mauroc4= brm(
formula = AUROC ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
data = d,
family = Beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
summary(mauroc4)
## Family: beta
## Links: mu = logit; phi = identity
## Formula: AUROC ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~language (Number of levels: 8)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.54 0.23 0.18 1.08 1.00 4388 5167
##
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.52 0.10 0.36 0.75 1.00 5059 8323
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -4.10 0.25 -4.55 -3.57 1.00 5965 6573
## AlgorithmBugspots -0.36 0.08 -0.52 -0.20 1.00 20536 12478
## LOC -0.11 0.09 -0.29 0.07 1.00 12468 11924
## FixCount 0.14 0.05 0.05 0.23 1.00 16167 12004
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 40.44 3.40 34.06 47.41 1.00 15995 10420
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
pp_check(mauroc4, nsamples = 100) + scale_y_continuous(trans='sqrt')
Using an approximate 95% interval on the elpd difference (elpd_diff ± 2 · se_diff), M3 and M4 perform equally well. As M3 is the simpler model, we use it as our candidate model.
loo_mauroc1 = loo(mauroc1)
loo_mauroc2 = loo(mauroc2)
loo_mauroc3 = loo(mauroc3)
loo_mauroc4 = loo(mauroc4)
loo_compare(loo_mauroc1, loo_mauroc2, loo_mauroc3, loo_mauroc4)
## elpd_diff se_diff
## mauroc4 0.0 0.0
## mauroc3 -0.4 1.1
## mauroc2 -65.6 11.3
## mauroc1 -70.6 11.5
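The model-selection rule applied here can be made explicit with the numbers printed above: two models are treated as equivalent when the absolute elpd difference is within roughly two standard errors (an approximate 95% interval). A sketch using the values from the loo_compare output:

```r
elpd_diff <- -0.4  # mauroc3 relative to mauroc4
se_diff   <- 1.1
abs(elpd_diff) < 2 * se_diff  # TRUE: M3 and M4 are indistinguishable

# The simpler models, in contrast, are clearly worse:
abs(-65.6) < 2 * 11.3  # FALSE for mauroc2
```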
For the EXAM score we again use a beta likelihood, as it represents the ratio of LOC one has to inspect to find a fault, averaged across all faults.
In our past work the mean EXAM value was around 0.226 or around -1 on the logit scale.
As we presume most of the values to be small and somewhat concentrated, we expect phi to be a little higher. We put a wide prior on phi as we are not certain where exactly it will lie.
\[ \begin{split} \mathcal{M}_3: \mathrm{EXAM_i} \thicksim &\ \mathrm{Beta}(\mu_i, \phi) \\ \mathrm{logit}(\mu_i) = &\ \alpha + \beta_{A} \mathrm{Algorithm}_i + \beta_{L} \mathrm{LOC}_i + \beta_F \mathrm{FixCount}_i + \alpha_{Project[i]}\\ \alpha \thicksim &\ \mathrm{Normal}(-1, 1) \\ \beta_{A}, \beta_{L}, \beta_F \thicksim &\ \mathrm{Normal}(0, 0.15) \\ \alpha_p \thicksim &\ \mathrm{Weibull}(2, 1)\ \ \ \mathrm{for}\ p = 1..32\\ \phi \thicksim &\ \mathrm{Normal}(50, 20) \end{split} \]
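A quick check (again not part of the original script) that the intercept prior matches the historical mean EXAM score of about 0.226:

```r
qlogis(0.226)  # approximately -1.23, close to the Normal(-1, 1) prior mean
plogis(-1)     # the prior mean back on the outcome scale, approximately 0.27
```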
mexam3p = brm(
formula = EXAM ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
data = d,
family = Beta(),
prior = c(
prior(normal(-1, 1), class=Intercept),
prior(normal(0, 0.15), class=b),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mexam3p, nsamples = 100) + scale_y_continuous(trans='sqrt')
mexam3= brm(
formula = EXAM ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
data = d,
family = Beta(),
prior = c(
prior(normal(-1, 1), class=Intercept),
prior(normal(0, 0.15), class=b),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
Results:
summary(mexam3)
## Family: beta
## Links: mu = logit; phi = identity
## Formula: EXAM ~ 1 + Algorithm + LOC + FixCount + (1 | Project)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.23 0.03 0.18 0.31 1.00 3017 4487
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -1.33 0.04 -1.42 -1.24 1.00 2123 4311
## AlgorithmBugspots 0.09 0.02 0.05 0.14 1.00 17942 12321
## LOC -0.07 0.03 -0.12 -0.01 1.00 7712 10194
## FixCount 0.01 0.02 -0.02 0.04 1.00 13109 12071
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 86.15 5.52 75.73 97.38 1.00 15292 12372
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The posterior predictive check shows a good overall fit.
pp_check(mexam3, nsamples = 100) + scale_y_continuous(trans='sqrt')
On the logit scale the effect of Bugspots is clearly positive, with no overlap with 0.
On the outcome scale there is some overlap between the two algorithms, but the
respective means lie outside the 95% intervals shown, and there is a clear
tendency for Linespots to have lower EXAM values than Bugspots.
mcmc_areas(mexam3, pars = c("b_AlgorithmBugspots"), prob = 0.95) + ggtitle("EXAM Posterior Distribution\n With 95% intervals")
eff = conditional_effects(mexam3, effects = c("Algorithm"))
eff$Algorithm
eff
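As for AUROC, a rough conversion of the population-level posterior means from summary(mexam3) to the outcome scale (ignoring posterior uncertainty and group-level effects) shows how small the absolute difference is:

```r
# Expected EXAM score at average LOC and FixCount, using posterior means only.
plogis(-1.33)         # Linespots: roughly 0.209
plogis(-1.33 + 0.09)  # Bugspots: roughly 0.224
```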
And Diagnostics: All diagnostics look good.
min(neff_ratio(mexam3))
## [1] 0.1323598
max(rhat(mexam3))
## [1] 1.001445
plot(mexam3)
mexam1p = brm(
formula = EXAM ~ 1 + Algorithm + LOC,
data = d,
family = Beta(),
prior = c(
prior(normal(-1, 1), class=Intercept),
prior(normal(0, 0.15), class=b),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mexam1p, nsamples = 100) + scale_y_continuous(trans='sqrt')
mexam1 = brm(
formula = EXAM ~ 1 + Algorithm + LOC,
data = d,
family = Beta(),
prior = c(
prior(normal(-1, 1), class=Intercept),
prior(normal(0, 0.15), class=b),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mexam1, nsamples = 100) + scale_y_continuous(trans='sqrt')
summary(mexam1)
## Family: beta
## Links: mu = logit; phi = identity
## Formula: EXAM ~ 1 + Algorithm + LOC
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -1.30 0.02 -1.34 -1.26 1.00 14865 10711
## AlgorithmBugspots 0.09 0.03 0.03 0.15 1.00 14623 10199
## LOC -0.03 0.02 -0.06 0.00 1.00 14838 11222
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 51.09 3.23 44.97 57.64 1.00 14239 11189
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
mexam2p = brm(
formula = EXAM ~ 1 + Algorithm + LOC + FixCount,
data = d,
family = Beta(),
prior = c(
prior(normal(-1, 1), class=Intercept),
prior(normal(0, 0.15), class=b),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mexam2p, nsamples = 100) + scale_y_continuous(trans='sqrt')
mexam2= brm(
formula = EXAM ~ 1 + Algorithm + LOC + FixCount,
data = d,
family = Beta(),
prior = c(
prior(normal(-1, 1), class=Intercept),
prior(normal(0, 0.15), class=b),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mexam2, nsamples = 100) + scale_y_continuous(trans='sqrt')
summary(mexam2)
## Family: beta
## Links: mu = logit; phi = identity
## Formula: EXAM ~ 1 + Algorithm + LOC + FixCount
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -1.30 0.02 -1.34 -1.26 1.00 15010 10648
## AlgorithmBugspots 0.09 0.03 0.03 0.15 1.00 16359 11268
## LOC -0.03 0.02 -0.06 0.00 1.00 17271 11433
## FixCount -0.01 0.01 -0.04 0.02 1.00 17500 11478
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 51.01 3.28 44.76 57.69 1.00 17602 10996
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
mexam4p = brm(
formula = EXAM ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
data = d,
family = Beta(),
prior = c(
prior(normal(-1, 1), class=Intercept),
prior(normal(0, 0.15), class=b),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mexam4p, nsamples = 100) + scale_y_continuous(trans='sqrt')
mexam4= brm(
formula = EXAM ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
data = d,
family = Beta(),
prior = c(
prior(normal(-1, 1), class=Intercept),
prior(normal(0, 0.15), class=b),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mexam4, nsamples = 100) + scale_y_continuous(trans='sqrt')
summary(mexam4)
## Family: beta
## Links: mu = logit; phi = identity
## Formula: EXAM ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~language (Number of levels: 8)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.23 0.11 0.08 0.50 1.00 4199 5616
##
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.19 0.03 0.13 0.26 1.00 4741 8248
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -1.29 0.10 -1.48 -1.08 1.00 6814 7048
## AlgorithmBugspots 0.09 0.02 0.05 0.14 1.00 25166 11031
## LOC -0.07 0.03 -0.12 -0.02 1.00 14248 12899
## FixCount 0.00 0.02 -0.03 0.03 1.00 20418 12551
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 86.23 5.51 75.75 97.33 1.00 21395 11079
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Using an approximate 95% interval on the elpd difference (elpd_diff ± 2 · se_diff), M3 and M4 perform equally well. As M3 is the simpler model, we use it as our candidate model.
loo_mexam1 = loo(mexam1)
loo_mexam2 = loo(mexam2)
loo_mexam3 = loo(mexam3)
loo_mexam4 = loo(mexam4)
loo_compare(loo_mexam1, loo_mexam2, loo_mexam3, loo_mexam4)
## elpd_diff se_diff
## mauroc4 0.0 0.0
## mauroc3 -0.4 1.1
## mauroc2 -65.6 11.3
## mexam1 -1103.3 51.3
For the EInspectF we use a negative binomial likelihood, as it is a count outcome. We also considered a Poisson, but the difference between mean and variance makes the negative binomial the better candidate.
We do not have past experience with this value, but based on the results of Zou et al. (2019) we use a prior mean of 500 LOC, or roughly 6 on the log scale.
\[ \begin{split} \mathcal{M}_3: \mathrm{EInspectF_i} \thicksim &\ \mathrm{NegBinom}(\mu_i, \phi) \\ \mathrm{log}(\mu_i) = &\ \alpha + \beta_{A} \mathrm{Algorithm}_i + \beta_{L} \mathrm{LOC}_i + \beta_F \mathrm{FixCount}_i + \alpha_{Project[i]} \\ \alpha \thicksim &\ \mathrm{Normal}(6, 1) \\ \beta_{A}, \beta_{L}, \beta_F \thicksim &\ \mathrm{Normal}(0, 1) \\ \alpha_p \thicksim &\ \mathrm{Weibull}(2, 1)\ \ \ \mathrm{for}\ p = 1..32\\ \phi \thicksim &\ \mathrm{Weibull}(2, 1) \\ \end{split} \]
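Two quick checks behind these choices (a sketch, not part of the original script): the intercept prior on the log scale, and the overdispersion that motivates the negative binomial over the Poisson:

```r
# 500 LOC on the log scale is roughly 6.2, matching the Normal(6, 1) prior below.
log(500)  # approximately 6.21

# Overdispersion: using the mean (~2994) and sd (~10293) of EInspectF from the
# data summary above, the variance far exceeds the mean, which rules out the
# Poisson's assumption of equal mean and variance.
10293.40^2 / 2993.67  # variance-to-mean ratio, several orders of magnitude above 1
```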
mEInspectF3p = brm(
formula = EInspectF ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
data = d,
family = negbinomial(),
prior = c(
prior(normal(6, 1), class=Intercept),
prior(normal(0, 1), class=b),
prior(weibull(2, 1), class=sd),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspectF3p, nsamples = 100) + scale_x_continuous(trans='log10')
mEInspectF3 = brm(
formula = EInspectF ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
data = d,
family = negbinomial(),
prior = c(
prior(normal(6, 1), class=Intercept),
prior(normal(0, 1), class=b),
prior(weibull(2, 1), class=sd),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
Results:
summary(mEInspectF3)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: EInspectF ~ 1 + Algorithm + LOC + FixCount + (1 | Project)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.40 0.19 1.08 1.80 1.00 3210 5397
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 5.70 0.25 5.20 6.19 1.00 2012 3290
## AlgorithmBugspots 1.45 0.13 1.19 1.71 1.00 14879 11538
## LOC 0.02 0.15 -0.27 0.32 1.00 8735 10682
## FixCount -0.57 0.08 -0.72 -0.42 1.00 10648 10574
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 0.62 0.03 0.56 0.69 1.00 15590 11175
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The comparison shows very similar LOO performance for M3 and M4. We chose M3 as our final model as it shows very good sampling behavior and because it is the smaller model of the two.
The posterior predictive check shows a good overall fit.
pp_check(mEInspectF3, nsamples = 100) + scale_x_continuous(trans='log10')
The 95% interval of the effect of ‘Bugspots’ on the log scale is entirely positive, with no overlap with 0.
On the outcome scale, Bugspots tends toward clearly higher EInspectF values than Linespots, though the posterior intervals for the two algorithms are wide and overlap substantially.
mcmc_areas(mEInspectF3, pars = c("b_AlgorithmBugspots"), prob = 0.95) + ggtitle("EInspectF Posterior Distribution\n With 95% intervals")
eff = conditional_effects(mEInspectF3, effects = c("Algorithm"))
eff$Algorithm
eff
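Finally, to read these log-scale coefficients on the outcome scale, a rough conversion of the population-level posterior means from summary(mEInspectF3), ignoring posterior uncertainty and group-level effects (a sketch, not part of the original script):

```r
# Expected lines to inspect until the first fault, at average LOC and FixCount.
exp(5.70)         # Linespots: roughly 300 lines
exp(5.70 + 1.45)  # Bugspots: roughly 1270 lines
```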
And Diagnostics: All diagnostics look good.
min(neff_ratio(mEInspectF3))
## [1] 0.127741
max(rhat(mEInspectF3))
## [1] 1.002961
plot(mEInspectF3)
mEInspectF1p = brm(
formula = EInspectF ~ 1 + Algorithm + LOC,
data = d,
family = negbinomial(),
prior = c(
prior(normal(6, 1), class=Intercept),
prior(normal(0, 1), class=b),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspectF1p, nsamples = 100) + scale_x_continuous(trans='log10')
mEInspectF1 = brm(
formula = EInspectF ~ 1 + Algorithm + LOC,
data = d,
family = negbinomial(),
prior = c(
prior(normal(6, 1), class=Intercept),
prior(normal(0, 1), class=b),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspectF1, nsamples = 100) + scale_x_continuous(trans='log10')
summary(mEInspectF1)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: EInspectF ~ 1 + Algorithm + LOC
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 6.91 0.11 6.69 7.14 1.00 14364 10960
## AlgorithmBugspots 1.44 0.16 1.13 1.75 1.00 13600 11402
## LOC 1.10 0.15 0.82 1.39 1.00 13918 10434
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 0.34 0.02 0.30 0.37 1.00 13645 10606
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
mEInspectF2p = brm(
formula = EInspectF ~ 1 + Algorithm + LOC + FixCount,
data = d,
family = negbinomial(),
prior = c(
prior(normal(6, 1), class=Intercept),
prior(normal(0, 1), class=b),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspectF2p, nsamples = 100) + scale_x_continuous(trans='log10')
mEInspectF2 = brm(
formula = EInspectF ~ 1 + Algorithm + LOC + FixCount,
data = d,
family = negbinomial(),
prior = c(
prior(normal(6, 1), class=Intercept),
prior(normal(0, 1), class=b),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspectF2, nsamples = 100) + scale_x_continuous(trans='log10')
summary(mEInspectF2)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: EInspectF ~ 1 + Algorithm + LOC + FixCount
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 6.45 0.10 6.26 6.66 1.00 17040 11248
## AlgorithmBugspots 1.59 0.15 1.30 1.87 1.00 15349 10498
## LOC 1.32 0.14 1.05 1.60 1.00 16656 9711
## FixCount -0.95 0.06 -1.07 -0.83 1.00 16382 11563
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 0.40 0.02 0.36 0.45 1.00 16242 11779
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
mEInspectF4p = brm(
formula = EInspectF ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
data = d,
family = negbinomial(),
prior = c(
prior(normal(6, 1), class=Intercept),
prior(normal(0, 1), class=b),
prior(weibull(2, 1), class=sd),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspectF4p, nsamples = 100) + scale_x_continuous(trans='log10')
mEInspectF4 = brm(
formula = EInspectF ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
data = d,
family = negbinomial(),
prior = c(
prior(normal(6, 1), class=Intercept),
prior(normal(0, 1), class=b),
prior(weibull(2, 1), class=sd),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspectF4, nsamples = 100) + scale_x_continuous(trans='log10')
summary(mEInspectF4)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: EInspectF ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~language (Number of levels: 8)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.00 0.32 0.42 1.71 1.00 6544 6112
##
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.13 0.18 0.83 1.53 1.00 5662 9674
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 5.73 0.40 4.91 6.51 1.00 9649 10245
## AlgorithmBugspots 1.46 0.13 1.20 1.72 1.00 22342 11553
## LOC -0.04 0.15 -0.32 0.25 1.00 17370 11623
## FixCount -0.55 0.07 -0.70 -0.41 1.00 19721 12519
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 0.62 0.04 0.56 0.70 1.00 20939 10976
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Based on a 95% z interval on the elpd difference, M3 and M4 perform equally well. As M3 is the simpler model, we use it as our candidate model.
loo_mEInspectF1 = loo(mEInspectF1)
loo_mEInspectF2 = loo(mEInspectF2)
loo_mEInspectF3 = loo(mEInspectF3)
loo_mEInspectF4 = loo(mEInspectF4)
loo_compare(loo_mEInspectF1, loo_mEInspectF2, loo_mEInspectF3, loo_mEInspectF4)
## elpd_diff se_diff
## mEInspectF4 0.0 0.0
## mEInspectF3 -1.0 1.1
## mEInspectF2 -140.0 16.0
## mEInspectF1 -208.1 19.9
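The “95% z score” criterion can be made explicit: a model counts as on par with the best one when its elpd_diff lies within about 1.96 standard errors of zero. A minimal base-R sketch, using the numbers from the loo_compare() table above:

```r
# Compare each model against the best one (mEInspectF4) using the
# elpd_diff / se_diff values printed by loo_compare() above.
elpd_diff <- c(mEInspectF3 = -1.0, mEInspectF2 = -140.0, mEInspectF1 = -208.1)
se_diff   <- c(mEInspectF3 = 1.1,  mEInspectF2 = 16.0,   mEInspectF1 = 19.9)
z <- elpd_diff / se_diff
# A model is practically indistinguishable from the best when |z| < 1.96.
abs(z) < 1.96
```

Only mEInspectF3 passes this check, which is why M3 and M4 are treated as equivalent while M1 and M2 are ruled out.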
For EInspect10 we use a negative binomial likelihood, as it is a count outcome. We also considered a Poisson likelihood, but the variance exceeds the mean (overdispersion), which makes the negative binomial the better candidate.
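The overdispersion argument can be illustrated with simulated counts (a sketch with arbitrary parameters, not the study data): a Poisson's variance equals its mean, while a negative binomial with mean mu and shape phi has variance mu + mu^2/phi.

```r
set.seed(2020)
# Poisson: variance equals the mean.
x_pois <- rpois(1e5, lambda = 5)
# Negative binomial: variance = mu + mu^2 / shape, here 5 + 25 = 30.
x_nb <- rnbinom(1e5, mu = 5, size = 1)  # size is the shape parameter
c(mean(x_pois), var(x_pois))  # both close to 5
c(mean(x_nb), var(x_nb))      # mean close to 5, variance close to 30
```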
We have no prior experience with either algorithm for this metric, but zouEmpiricalStudyFault2019 reports an EInspect value of 0 for Bugspots. Based on that, we set a low intercept prior of 0.01, or roughly -4 on the log scale.
\[ \begin{split} \mathcal{M}_3: \mathrm{EInspect10}_i \thicksim &\ \mathrm{NegBinom}(\mu_i, \phi) \\ \mathrm{log}(\mu_i) = &\ \alpha + \beta_{A} \mathrm{Algorithm}_i + \beta_{L} \mathrm{LOC}_i + \beta_F \mathrm{FixCount}_i + \alpha_{Project[i]} \\ \alpha \thicksim &\ \mathrm{Normal}(-4, 0.5) \\ \beta_{A}, \beta_{L}, \beta_F \thicksim &\ \mathrm{Normal}(0, 0.25) \\ \alpha_{p} \thicksim &\ \mathrm{Normal}(0, \sigma_{P})\ \ \ \mathrm{for}\ p = 1..32 \\ \sigma_{P} \thicksim &\ \mathrm{Weibull}(2, 1) \\ \phi \thicksim &\ \mathrm{Weibull}(2, 1) \\ \end{split} \]
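As a sanity check on the intercept prior (an illustration in base R; the negative binomial model uses a log link, so the prior lives on the log scale):

```r
# An expected EInspect10 count of about 0.01 maps to roughly -4 on the
# log scale, and a log-scale intercept of -4 maps back to about 0.018.
log(0.01)  # about -4.6
exp(-4)    # about 0.018
```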
mEInspect103p = brm(
formula = EInspect10 ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
data = d,
family = negbinomial(),
prior = c(
prior(normal(-4, 0.5), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=sd),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspect103p, nsamples = 100) + scale_x_continuous(trans='log10')
mEInspect103 = brm(
formula = EInspect10 ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
data = d,
family = negbinomial(),
prior = c(
prior(normal(-4, 0.5), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=sd),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
Results:
summary(mEInspect103)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: EInspect10 ~ 1 + Algorithm + LOC + FixCount + (1 | Project)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.33 0.32 0.75 2.02 1.00 5254 8321
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -3.04 0.35 -3.77 -2.40 1.00 7231 10822
## AlgorithmBugspots -0.40 0.20 -0.79 -0.01 1.00 26087 11934
## LOC -0.16 0.20 -0.56 0.24 1.00 15861 12654
## FixCount 0.13 0.14 -0.14 0.39 1.00 15648 12451
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 1.31 0.42 0.61 2.22 1.00 22923 11028
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The comparison shows essentially equal LOO performance for M3 and M4. We choose M3 as our candidate model, as it is the smaller one.
The posterior predictive check shows that the model fits the data well with only a slightly longer tail.
pp_check(mEInspect103, nsamples = 100) + scale_x_continuous(trans='log10')
The 95% interval of the effect of ‘Bugspots’ on the log scale lies entirely below zero, although the tail of the posterior overlaps zero.
On the outcome scale, the overlap between the two algorithms is large. While the means and lower bounds are close, the upper bound for Linespots is higher than for Bugspots.
mcmc_areas(mEInspect103, pars = c("b_AlgorithmBugspots"), prob = 0.95) + ggtitle("EInspect10 Posterior Distribution\n With 95% intervals")
eff = conditional_effects(mEInspect103, effects = c("Algorithm"))
eff$Algorithm
eff
Diagnostics: all diagnostics look good.
min(neff_ratio(mEInspect103))
## [1] 0.2338744
max(rhat(mEInspect103))
## [1] 1.000227
plot(mEInspect103)
mEInspect101p = brm(
formula = EInspect10 ~ 1 + Algorithm + LOC,
data = d,
family = negbinomial(),
prior = c(
prior(normal(-4, 0.5), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspect101p, nsamples = 100) + scale_x_continuous(trans='log10')
mEInspect101 = brm(
formula = EInspect10 ~ 1 + Algorithm + LOC,
data = d,
family = negbinomial(),
prior = c(
prior(normal(-4, 0.5), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspect101, nsamples = 100) + scale_x_continuous(trans='log10')
summary(mEInspect101)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: EInspect10 ~ 1 + Algorithm + LOC
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -2.17 0.19 -2.55 -1.81 1.00 13976 10777
## AlgorithmBugspots -0.82 0.28 -1.38 -0.28 1.00 12353 10943
## LOC -0.49 0.26 -1.04 -0.03 1.00 11104 8599
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 1.07 0.41 0.43 2.01 1.00 11858 9940
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
mEInspect102p = brm(
formula = EInspect10 ~ 1 + Algorithm + LOC + FixCount,
data = d,
family = negbinomial(),
prior = c(
prior(normal(-4, 0.5), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspect102p, nsamples = 100) + scale_x_continuous(trans='log10')
mEInspect102 = brm(
formula = EInspect10 ~ 1 + Algorithm + LOC + FixCount,
data = d,
family = negbinomial(),
prior = c(
prior(normal(-4, 0.5), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspect102, nsamples = 100) + scale_x_continuous(trans='log10')
summary(mEInspect102)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: EInspect10 ~ 1 + Algorithm + LOC + FixCount
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -2.31 0.18 -2.66 -1.97 1.00 17396 11372
## AlgorithmBugspots -0.42 0.20 -0.81 -0.03 1.00 16910 11519
## LOC -0.29 0.17 -0.65 0.02 1.00 15694 10689
## FixCount 0.22 0.12 -0.02 0.44 1.00 15929 10689
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 1.07 0.41 0.43 2.00 1.00 15902 10390
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
mEInspect104p = brm(
formula = EInspect10 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
data = d,
family = negbinomial(),
prior = c(
prior(normal(-4, 0.5), class=Intercept),
prior(normal(0, 0.3), class=b),
prior(weibull(2, 1), class=sd),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspect104p, nsamples = 100) + scale_x_continuous(trans='log10')
mEInspect104 = brm(
formula = EInspect10 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
data = d,
family = negbinomial(),
prior = c(
prior(normal(-4, 0.5), class=Intercept),
prior(normal(0, 0.3), class=b),
prior(weibull(2, 1), class=sd),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspect104, nsamples = 100) + scale_x_continuous(trans='log10')
summary(mEInspect104)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: EInspect10 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~language (Number of levels: 8)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.11 0.37 0.45 1.92 1.00 7021 6489
##
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.87 0.35 0.24 1.60 1.00 4181 5741
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -3.28 0.40 -4.10 -2.52 1.00 10753 12239
## AlgorithmBugspots -0.49 0.22 -0.92 -0.06 1.00 21691 11280
## LOC -0.11 0.23 -0.58 0.35 1.00 16340 12736
## FixCount 0.13 0.14 -0.15 0.39 1.00 18232 12427
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 1.31 0.42 0.60 2.24 1.00 24336 10413
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Based on a 95% z interval on the elpd difference, M3 and M4 perform equally well. As M3 is the simpler model, we use it as our candidate model.
loo_mEInspect101 = loo(mEInspect101)
loo_mEInspect102 = loo(mEInspect102)
loo_mEInspect103 = loo(mEInspect103)
loo_mEInspect104 = loo(mEInspect104)
loo_compare(loo_mEInspect101, loo_mEInspect102, loo_mEInspect103, loo_mEInspect104)
## elpd_diff se_diff
## mEInspect104 0.0 0.0
## mEInspect103 -0.8 1.5
## mEInspect101 -13.8 4.3
## mEInspect102 -14.0 4.0
For EInspect100 we use a negative binomial likelihood, as it is a count outcome. We also considered a Poisson likelihood, but the variance exceeds the mean (overdispersion), which makes the negative binomial the better candidate.
We do not have past experience with this metric, so we take the prior from the EInspect10 models and multiply its mean by 5. We do not multiply by 10: while we expect more faults, given the ranking-based nature of the metric we do not expect the growth to be linear. An intercept mean of 0.05 translates to roughly -3 on the log scale.
\[ \begin{split} \mathcal{M}_3: \mathrm{EInspect100}_i \thicksim &\ \mathrm{NegBinom}(\mu_i, \phi) \\ \mathrm{log}(\mu_i) = &\ \alpha + \beta_{A} \mathrm{Algorithm}_i + \beta_{L} \mathrm{LOC}_i + \beta_F \mathrm{FixCount}_i + \alpha_{Project[i]} \\ \alpha \thicksim &\ \mathrm{Normal}(-3, 0.5) \\ \beta_{A}, \beta_{L}, \beta_F \thicksim &\ \mathrm{Normal}(0, 0.25) \\ \alpha_{p} \thicksim &\ \mathrm{Normal}(0, \sigma_{P})\ \ \ \mathrm{for}\ p = 1..32 \\ \sigma_{P} \thicksim &\ \mathrm{Weibull}(2, 1) \\ \phi \thicksim &\ \mathrm{Weibull}(2, 1) \\ \end{split} \]
mEInspect1003p = brm(
formula = EInspect100 ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
data = d,
family = negbinomial(),
prior = c(
prior(normal(-3, 0.5), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=sd),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspect1003p, nsamples = 100) + scale_x_continuous(trans='log10')
mEInspect1003 = brm(
formula = EInspect100 ~ 1 + Algorithm + LOC + FixCount + (1 | Project),
data = d,
family = negbinomial(),
prior = c(
prior(normal(-3, 0.5), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=sd),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
Results:
summary(mEInspect1003)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: EInspect100 ~ 1 + Algorithm + LOC + FixCount + (1 | Project)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.18 0.24 0.78 1.71 1.00 3537 6382
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -1.11 0.27 -1.68 -0.64 1.00 3411 5643
## AlgorithmBugspots -0.47 0.13 -0.73 -0.21 1.00 17516 11161
## LOC -0.23 0.16 -0.55 0.08 1.00 10244 11322
## FixCount 0.23 0.08 0.08 0.40 1.00 12401 12197
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 1.37 0.28 0.91 1.98 1.00 16274 11240
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The LOO comparison shows very similar performance for M3 and M4. We choose M3 as our final model because it shows very good sampling behavior and is the smaller of the two.
The posterior predictive check shows a good overall fit.
pp_check(mEInspect1003, nsamples = 100) + scale_x_continuous(trans='log10')
The 95% interval of the effect of ‘Bugspots’ on the log scale lies entirely below zero, although the tail of the posterior still overlaps zero.
On the outcome scale, the overlap between the two algorithms is large. While the means and lower bounds are close, the upper bound for Linespots is higher than for Bugspots.
mcmc_areas(mEInspect1003, pars = c("b_AlgorithmBugspots"), prob = 0.95) + ggtitle("EInspect100 Posterior Distribution\n With 95% intervals")
eff = conditional_effects(mEInspect1003, effects = c("Algorithm"))
eff$Algorithm
eff
Diagnostics: all diagnostics look good.
min(neff_ratio(mEInspect1003))
## [1] 0.1807117
max(rhat(mEInspect1003))
## [1] 1.001631
plot(mEInspect1003)
mEInspect1001p = brm(
formula = EInspect100 ~ 1 + Algorithm + LOC,
data = d,
family = negbinomial(),
prior = c(
prior(normal(-3, 0.5), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspect1001p, nsamples = 100) + scale_x_continuous(trans='log10')
mEInspect1001 = brm(
formula = EInspect100 ~ 1 + Algorithm + LOC,
data = d,
family = negbinomial(),
prior = c(
prior(normal(-3, 0.5), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspect1001, nsamples = 100) + scale_x_continuous(trans='log10')
summary(mEInspect1001)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: EInspect100 ~ 1 + Algorithm + LOC
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -0.39 0.11 -0.60 -0.17 1.00 13003 11232
## AlgorithmBugspots -0.35 0.14 -0.62 -0.07 1.00 13559 10168
## LOC -0.43 0.12 -0.69 -0.20 1.00 14109 10594
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 0.56 0.09 0.40 0.76 1.00 13854 10803
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
mEInspect1002p = brm(
formula = EInspect100 ~ 1 + Algorithm + LOC + FixCount,
data = d,
family = negbinomial(),
prior = c(
prior(normal(-3, 0.5), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspect1002p, nsamples = 100) + scale_x_continuous(trans='log10')
mEInspect1002 = brm(
formula = EInspect100 ~ 1 + Algorithm + LOC + FixCount,
data = d,
family = negbinomial(),
prior = c(
prior(normal(-3, 0.5), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspect1002, nsamples = 100) + scale_x_continuous(trans='log10')
summary(mEInspect1002)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: EInspect100 ~ 1 + Algorithm + LOC + FixCount
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -0.49 0.11 -0.70 -0.29 1.00 16002 10707
## AlgorithmBugspots -0.35 0.14 -0.62 -0.07 1.00 16079 10467
## LOC -0.49 0.13 -0.74 -0.25 1.00 14761 11725
## FixCount 0.43 0.07 0.28 0.57 1.00 15608 10919
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 0.72 0.13 0.50 1.01 1.00 14112 10454
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
mEInspect1004p = brm(
formula = EInspect100 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
data = d,
family = negbinomial(),
prior = c(
prior(normal(-3, 0.5), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=sd),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspect1004p, nsamples = 100) + scale_x_continuous(trans='log10')
mEInspect1004 = brm(
formula = EInspect100 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language),
data = d,
family = negbinomial(),
prior = c(
prior(normal(-3, 0.5), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(weibull(2, 1), class=sd),
prior(weibull(2, 1), class=shape)
),
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(mEInspect1004, nsamples = 100) + scale_x_continuous(trans='log10')
summary(mEInspect1004)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: EInspect100 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~language (Number of levels: 8)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.04 0.36 0.42 1.83 1.00 5569 5802
##
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.87 0.23 0.49 1.39 1.00 4083 7817
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -1.59 0.42 -2.48 -0.84 1.00 7030 9277
## AlgorithmBugspots -0.47 0.13 -0.73 -0.21 1.00 25809 11554
## LOC -0.10 0.16 -0.43 0.21 1.00 16537 12260
## FixCount 0.24 0.08 0.08 0.39 1.00 18753 12152
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 1.36 0.27 0.91 1.97 1.00 23146 12396
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Based on a 95% z interval on the elpd difference, M3 and M4 perform equally well. As M3 is the simpler model, we use it as our candidate model.
loo_mEInspect1001 = loo(mEInspect1001)
loo_mEInspect1002 = loo(mEInspect1002)
loo_mEInspect1003 = loo(mEInspect1003)
loo_mEInspect1004 = loo(mEInspect1004)
loo_compare(loo_mEInspect1001, loo_mEInspect1002, loo_mEInspect1003, loo_mEInspect1004)
## elpd_diff se_diff
## mEInspect1003 0.0 0.0
## mEInspect1004 -1.2 1.2
## mEInspect1002 -43.8 8.3
## mEInspect1001 -60.5 10.1
For AUCEC1, the area under the cost-effectiveness curve with a cut-off at 1% of the LOC, we use a zero-inflated beta likelihood. The AUCEC values are all normalized to their respective optimal value, which puts them between 0 and 1. However, the low cut-off produces some zero results, which the zero-inflation component accounts for.
We do not have past experience with this metric, but based on arisholmSystematicComprehensiveInvestigation2010 we use an intercept prior of 0.1, or roughly -2 on the logit scale.
As we presume most values to be small and somewhat concentrated, we expect the precision phi to be fairly high, and we put a wide prior on it since we are not certain where exactly it will lie.
\[ \begin{split} \mathcal{M}_2: \mathrm{AUCEC1}_i \thicksim &\ \mathrm{ZIBeta}(\mu_i, \phi, \lambda_i) \\ \mathrm{logit}(\mu_i) = &\ \alpha + \beta_{A} \mathrm{Algorithm}_i + \beta_{L} \mathrm{LOC}_i + \beta_F \mathrm{FixCount}_i + \alpha_{Project[i]} \\ \mathrm{logit}(\lambda_i) = &\ \alpha_{zi} + \beta_{ziA} \mathrm{Algorithm}_i + \alpha_{ziProject[i]} \\ \alpha \thicksim &\ \mathrm{Normal}(-2, 1) \\ \beta_{A}, \beta_{L}, \beta_F \thicksim &\ \mathrm{Normal}(0, 0.25) \\ \alpha_{zi} \thicksim &\ \mathrm{Logistic}(0, 1) \\ \beta_{ziA} \thicksim &\ \mathrm{Normal}(0, 2) \\ \phi \thicksim &\ \mathrm{Normal}(50, 20) \\ \end{split} \]
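What these priors imply can be checked numerically (a base-R sketch; brms parameterizes the beta distribution by mean mu and precision phi, with variance mu(1 - mu)/(1 + phi)):

```r
# The logit link maps an expected AUCEC1 of 0.1 to roughly -2.
qlogis(0.1)  # about -2.2
# Higher precision phi concentrates the beta distribution around its mean.
beta_sd <- function(mu, phi) sqrt(mu * (1 - mu) / (1 + phi))
beta_sd(0.1, phi = 50)  # about 0.04: tightly concentrated
beta_sd(0.1, phi = 5)   # about 0.12: much more spread out
```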
maucec12p = brm(
bf(AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project), zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(maucec12p, nsamples = 100) + scale_y_continuous(trans='sqrt')
maucec12 = brm(
bf(AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project), zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
Results:
summary(maucec12)
## Family: zero_inflated_beta
## Links: mu = logit; phi = identity; zi = logit
## Formula: AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project)
## zi ~ Algorithm + (1 | Project)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.41 0.07 0.30 0.56 1.00 4212 7350
## sd(zi_Intercept) 2.28 0.62 1.32 3.74 1.00 5235 8181
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept -2.17 0.08 -2.33 -2.01 1.00 3493
## zi_Intercept -5.39 0.80 -7.14 -4.00 1.00 8024
## AlgorithmBugspots -0.49 0.05 -0.59 -0.39 1.00 23185
## LOC 0.25 0.05 0.15 0.34 1.00 9471
## FixCount -0.07 0.03 -0.13 -0.01 1.00 17302
## zi_AlgorithmBugspots 2.05 0.52 1.10 3.15 1.00 22102
## Tail_ESS
## Intercept 6434
## zi_Intercept 8985
## AlgorithmBugspots 12418
## LOC 10836
## FixCount 13004
## zi_AlgorithmBugspots 11493
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 39.24 2.75 33.98 44.84 1.00 18771 11813
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The posterior predictive check shows a good fit.
pp_check(maucec12, nsamples = 100) + scale_y_continuous(trans='sqrt')
The effect of ‘Bugspots’ on the logit scale is entirely negative, with no overlap with 0.
On the outcome scale, there is no overlap between the two algorithms, with Linespots consistently having higher values than Bugspots.
mcmc_areas(maucec12, pars = c("b_AlgorithmBugspots"), prob = 0.95) + ggtitle("AUCEC1 Posterior Distribution\n With 95% intervals")
eff = conditional_effects(maucec12, effects = c("Algorithm"))
eff$Algorithm
eff
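Because mu uses a logit link here, the coefficient can also be read as an odds ratio (a quick illustration using the posterior mean of b_AlgorithmBugspots, about -0.49, from the summary above):

```r
# exp() of a logit-scale coefficient gives the multiplicative change in
# the odds mu / (1 - mu); a value below 1 favors Linespots.
exp(-0.49)  # about 0.61
```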
Diagnostics: all diagnostics look good.
min(neff_ratio(maucec12))
## [1] 0.2046774
max(rhat(maucec12))
## [1] 1.001567
plot(maucec12)
maucec11p = brm(
bf(AUCEC1 ~ 1 + Algorithm + LOC, zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(maucec11p, nsamples = 100) + scale_y_continuous(trans='sqrt')
maucec11 = brm(
bf(AUCEC1 ~ 1 + Algorithm + LOC, zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(maucec11, nsamples = 100) + scale_y_continuous(trans='sqrt')
summary(maucec11)
## Family: zero_inflated_beta
## Links: mu = logit; phi = identity; zi = logit
## Formula: AUCEC1 ~ 1 + Algorithm + LOC
## zi ~ Algorithm + (1 | Project)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(zi_Intercept) 2.29 0.63 1.33 3.78 1.00 4763 7661
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept -2.13 0.04 -2.21 -2.06 1.00 21858
## zi_Intercept -5.39 0.80 -7.17 -3.99 1.00 6962
## AlgorithmBugspots -0.45 0.06 -0.57 -0.34 1.00 18373
## LOC 0.10 0.03 0.05 0.15 1.00 19145
## zi_AlgorithmBugspots 2.06 0.53 1.09 3.16 1.00 18603
## Tail_ESS
## Intercept 12111
## zi_Intercept 7907
## AlgorithmBugspots 11573
## LOC 10963
## zi_AlgorithmBugspots 11734
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 26.67 1.85 23.16 30.42 1.00 15799 11385
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
maucec13p = brm(
bf(AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project), zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(maucec13p, nsamples = 100) + scale_y_continuous(trans='sqrt')
maucec13 = brm(
bf(AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project), zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(maucec13, nsamples = 100) + scale_y_continuous(trans='sqrt')
summary(maucec13)
## Family: zero_inflated_beta
## Links: mu = logit; phi = identity; zi = logit
## Formula: AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project)
## zi ~ Algorithm + (1 | Project)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.42 0.07 0.31 0.58 1.00 3692 5914
## sd(zi_Intercept) 2.28 0.61 1.32 3.74 1.00 5230 7440
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept -2.18 0.08 -2.34 -2.02 1.00 3382
## zi_Intercept -5.38 0.79 -7.07 -3.99 1.00 7531
## AlgorithmBugspots -0.49 0.05 -0.59 -0.39 1.00 19323
## LOC 0.25 0.05 0.15 0.35 1.00 7594
## FixCount -0.07 0.03 -0.13 -0.01 1.00 13525
## zi_AlgorithmBugspots 2.06 0.53 1.08 3.14 1.00 17396
## Tail_ESS
## Intercept 6064
## zi_Intercept 8299
## AlgorithmBugspots 11761
## LOC 9947
## FixCount 11882
## zi_AlgorithmBugspots 10806
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 39.30 2.72 34.13 44.78 1.00 16522 11712
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
maucec14p = brm(
bf(AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language), zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(maucec14p, nsamples = 100) + scale_y_continuous(trans='sqrt')
maucec14 = brm(
bf(AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language), zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(maucec14, nsamples = 100) + scale_y_continuous(trans='sqrt')
summary(maucec14)
## Family: zero_inflated_beta
## Links: mu = logit; phi = identity; zi = logit
## Formula: AUCEC1 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language)
## zi ~ Algorithm + (1 | Project)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~language (Number of levels: 8)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.37 0.17 0.11 0.79 1.00 4355 4967
##
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.36 0.07 0.25 0.51 1.00 5259 8958
## sd(zi_Intercept) 2.27 0.63 1.30 3.77 1.00 5397 8797
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept -2.20 0.16 -2.53 -1.87 1.00 6942
## zi_Intercept -5.37 0.79 -7.08 -3.97 1.00 8757
## AlgorithmBugspots -0.49 0.05 -0.59 -0.40 1.00 29558
## LOC 0.28 0.05 0.18 0.37 1.00 14255
## FixCount -0.07 0.03 -0.14 -0.01 1.00 22614
## zi_AlgorithmBugspots 2.05 0.53 1.08 3.15 1.00 25097
## Tail_ESS
## Intercept 7564
## zi_Intercept 9938
## AlgorithmBugspots 11591
## LOC 11953
## FixCount 12556
## zi_AlgorithmBugspots 11732
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 39.48 2.75 34.27 45.08 1.00 25254 11685
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Based on a 95% z-interval on the elpd differences, M3 and M4 perform equally well. As M3 is the simpler model, we use it as our candidate model.
loo_maucec11 = loo(maucec11)
loo_maucec12 = loo(maucec12)
loo_maucec13 = loo(maucec13)
loo_maucec14 = loo(maucec14)
loo_compare(loo_maucec11, loo_maucec12, loo_maucec13, loo_maucec14)
## elpd_diff se_diff
## maucec14 0.0 0.0
## maucec13 -1.1 1.2
## maucec12 -1.1 1.2
## maucec11 -69.3 10.5
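The 95% z-interval criterion can be made explicit: treating each elpd difference as approximately normal, a model is distinguishable from the best one only if the interval elpd_diff ± 1.96 · se_diff excludes 0. A small sketch using the numbers printed above:

```r
# elpd differences (relative to maucec14) and their standard errors,
# copied from the loo_compare() output above.
elpd_diff <- c(maucec13 = -1.1, maucec12 = -1.1, maucec11 = -69.3)
se_diff   <- c(maucec13 =  1.2, maucec12 =  1.2, maucec11 = 10.5)

lower <- elpd_diff - 1.96 * se_diff
upper <- elpd_diff + 1.96 * se_diff

# TRUE: the interval contains 0, i.e. the model is indistinguishable
# from the best model under this criterion.
data.frame(lower, upper, indistinguishable = lower <= 0 & upper >= 0)
```

Only maucec11, the model without any LOC adjustment in the mean, is clearly worse; the intervals for the remaining models all contain 0.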
For the area under the cost-effectiveness curve at a 5% LOC cut-off (AUCEC5) we again use a zero-inflated beta likelihood. The AUCEC values are all normalized to their respective optimal value, which puts them between 0 and 1. However, the low cut-off leads to some exact-0 results, which is why we need the zero-inflation component.
We do not have past experience with this metric, but based on Arisholm et al. (2010) we use an intercept prior centred on 0.1, or roughly -2 on the logit scale.
As we presume most of the values to be small and somewhat concentrated, we expect phi to be fairly large. We put a wide prior on phi as we are not certain where exactly it will lie.
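The zero-inflated beta likelihood is a two-part mixture: with probability λ an observation is an exact 0, and otherwise it is drawn from a beta distribution with mean μ and precision φ (shape parameters μφ and (1 − μ)φ, the parameterization brms uses). A minimal simulation sketch with illustrative parameter values chosen by us, not fitted values:

```r
set.seed(2020)

# Illustrative parameters: mean mu, precision phi, zero-inflation prob lambda.
mu <- 0.10; phi <- 40; lambda <- 0.05
n <- 10000

# Mixture draw: exact zero with probability lambda,
# otherwise Beta(mu * phi, (1 - mu) * phi).
is_zero <- rbinom(n, size = 1, prob = lambda) == 1
y <- ifelse(is_zero, 0, rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi))

mean(y == 0)    # close to lambda
mean(y[y > 0])  # close to mu
```

This makes clear why a plain beta likelihood would fail here: the beta distribution assigns zero density to exact 0s, which the low cut-off produces.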
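The intercept prior can be checked directly: the inverse-logit of the prior mean -2 is about 0.12, and logit(0.1) is about -2.2, so a Normal(-2, 1) prior on the logit scale does centre the baseline AUCEC near 0.1. A quick prior simulation on the outcome scale:

```r
plogis(-2)   # inverse-logit of the prior mean, ~0.119
qlogis(0.1)  # logit of the target value 0.1, ~-2.197

# Implied prior on the baseline AUCEC (outcome scale), via simulation.
set.seed(2020)
draws <- plogis(rnorm(10000, mean = -2, sd = 1))
round(quantile(draws, c(0.025, 0.5, 0.975)), 3)
```

The 95% prior interval spans roughly 0.02 to 0.5, wide enough to let the data dominate while keeping the mass on plausible, small AUCEC values.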
\[ \begin{split} \mathcal{M}_2: \mathrm{AUCEC5}_i \thicksim &\ \mathrm{ZIBeta}(\mu_i, \phi, \lambda_i) \\ \mathrm{logit}(\mu_i) = &\ \alpha + \beta_{A} \mathrm{Algorithm}_i + \beta_{L} \mathrm{LOC}_i + \beta_F \mathrm{FixCount}_i \\ \mathrm{logit}(\lambda_i) = &\ \beta_{ziA} \mathrm{Algorithm}_i + \alpha_{ziProject[i]} \\ \alpha \thicksim &\ \mathrm{Normal}(-2, 1) \\ \beta_{A}, \beta_{L}, \beta_F \thicksim &\ \mathrm{Normal}(0, 0.25) \\ \beta_{ziA} \thicksim &\ \mathrm{Normal}(0, 2) \\ \alpha_{zip} \thicksim &\ \mathrm{Logistic}(0, 1)\ \ \ \mathrm{for}\ zip = 1..32 \\ \phi \thicksim &\ \mathrm{Normal}(50, 20) \\ \end{split} \]
maucec52p = brm(
bf(AUCEC5 ~ 1 + Algorithm + LOC + FixCount, zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(maucec52p, nsamples = 100) + scale_y_continuous(trans='sqrt')
maucec52 = brm(
bf(AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project), zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
Results:
summary(maucec52)
## Family: zero_inflated_beta
## Links: mu = logit; phi = identity; zi = logit
## Formula: AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project)
## zi ~ Algorithm + (1 | Project)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.44 0.07 0.33 0.59 1.00 3554 7184
## sd(zi_Intercept) 1.75 0.85 0.28 3.66 1.00 5717 3740
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept -1.21 0.08 -1.37 -1.04 1.00 2347
## zi_Intercept -7.33 1.45 -10.52 -4.94 1.00 11425
## AlgorithmBugspots -0.45 0.04 -0.53 -0.37 1.00 27401
## LOC 0.21 0.04 0.12 0.29 1.00 9703
## FixCount -0.02 0.03 -0.07 0.04 1.00 17683
## zi_AlgorithmBugspots 1.84 1.09 -0.12 4.20 1.00 24559
## Tail_ESS
## Intercept 4270
## zi_Intercept 10342
## AlgorithmBugspots 11994
## LOC 12021
## FixCount 12702
## zi_AlgorithmBugspots 10632
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 29.58 1.95 25.93 33.51 1.00 23528 11656
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The posterior predictive check shows a good fit.
pp_check(maucec52, nsamples = 100) + scale_y_continuous(trans='sqrt')
The effect of Bugspots on the logit scale is entirely negative, with no overlap with zero.
On the outcome scale, there is no overlap between the two algorithms, with Linespots consistently achieving higher values than Bugspots.
mcmc_areas(maucec52, pars = c("b_AlgorithmBugspots"), prob = 0.95) + ggtitle("AUCEC5 Posterior Distribution\n With 95% intervals")
eff = conditional_effects(maucec52, effects = c("Algorithm"))
eff$Algorithm
eff
Diagnostics: all look good, with the minimum effective-sample-size ratio well above the usual 0.1 heuristic and the maximum Rhat below 1.01.
min(neff_ratio(maucec52))
## [1] 0.1473713
max(rhat(maucec52))
## [1] 1.000866
plot(maucec52)
maucec51p = brm(
bf(AUCEC5 ~ 1 + Algorithm + LOC, zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(maucec51p, nsamples = 100) + scale_y_continuous(trans='sqrt')
maucec51 = brm(
bf(AUCEC5 ~ 1 + Algorithm + LOC, zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(maucec51, nsamples = 100) + scale_y_continuous(trans='sqrt')
summary(maucec51)
## Family: zero_inflated_beta
## Links: mu = logit; phi = identity; zi = logit
## Formula: AUCEC5 ~ 1 + Algorithm + LOC
## zi ~ Algorithm + (1 | Project)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(zi_Intercept) 1.75 0.86 0.25 3.70 1.00 5940 3996
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept -1.18 0.04 -1.25 -1.11 1.00 27599
## zi_Intercept -7.32 1.45 -10.54 -4.93 1.00 12209
## AlgorithmBugspots -0.42 0.05 -0.52 -0.32 1.00 26109
## LOC 0.07 0.03 0.02 0.12 1.00 33709
## zi_AlgorithmBugspots 1.84 1.09 -0.10 4.18 1.00 24736
## Tail_ESS
## Intercept 12331
## zi_Intercept 9832
## AlgorithmBugspots 11690
## LOC 11132
## zi_AlgorithmBugspots 11076
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 16.33 1.04 14.36 18.41 1.00 28359 12427
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
maucec53p = brm(
bf(AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project), zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(maucec53p, nsamples = 100) + scale_y_continuous(trans='sqrt')
maucec53 = brm(
bf(AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project), zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(maucec53, nsamples = 100) + scale_y_continuous(trans='sqrt')
summary(maucec53)
## Family: zero_inflated_beta
## Links: mu = logit; phi = identity; zi = logit
## Formula: AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project)
## zi ~ Algorithm + (1 | Project)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.45 0.07 0.34 0.60 1.00 3330 6214
## sd(zi_Intercept) 1.75 0.85 0.29 3.70 1.00 5430 3939
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept -1.21 0.09 -1.37 -1.04 1.00 2021
## zi_Intercept -7.32 1.45 -10.58 -4.93 1.00 9903
## AlgorithmBugspots -0.46 0.04 -0.54 -0.37 1.00 22895
## LOC 0.21 0.04 0.12 0.29 1.00 8400
## FixCount -0.02 0.03 -0.07 0.03 1.00 15596
## zi_AlgorithmBugspots 1.83 1.08 -0.11 4.15 1.00 19862
## Tail_ESS
## Intercept 3568
## zi_Intercept 9832
## AlgorithmBugspots 12021
## LOC 10652
## FixCount 12670
## zi_AlgorithmBugspots 11114
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 29.60 1.97 25.89 33.53 1.00 21058 11452
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
maucec54p = brm(
bf(AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language), zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = "only",
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(maucec54p, nsamples = 100) + scale_y_continuous(trans='sqrt')
maucec54 = brm(
bf(AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language), zi ~ Algorithm + (1 | Project)),
data = d,
family = zero_inflated_beta(),
prior = c(
prior(normal(-2, 1), class=Intercept),
prior(normal(0, 0.25), class=b),
prior(normal(0, 2), class=b, dpar="zi"),
prior(weibull(2, 1), class=sd),
prior(normal(50, 20), class=phi)
),
init = 0,
iter = SAMPLES,
warmup = WARMUP,
chains = CHAINS,
cores = parallel::detectCores(),
sample_prior = FALSE,
control = list(adapt_delta = DELTA, max_treedepth = TREE),
seed = SEED
)
pp_check(maucec54, nsamples = 100) + scale_y_continuous(trans='sqrt')
summary(maucec54)
## Family: zero_inflated_beta
## Links: mu = logit; phi = identity; zi = logit
## Formula: AUCEC5 ~ 1 + Algorithm + LOC + FixCount + (1 | Project) + (1 | language)
## zi ~ Algorithm + (1 | Project)
## Data: d (Number of observations: 480)
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup samples = 16000
##
## Group-Level Effects:
## ~language (Number of levels: 8)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.40 0.17 0.14 0.81 1.00 4176 4326
##
## ~Project (Number of levels: 32)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.37 0.07 0.26 0.51 1.00 4335 8732
## sd(zi_Intercept) 1.75 0.86 0.25 3.71 1.00 5047 3617
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept -1.26 0.17 -1.62 -0.95 1.00 5319
## zi_Intercept -7.32 1.47 -10.63 -4.92 1.00 9390
## AlgorithmBugspots -0.46 0.04 -0.54 -0.38 1.00 27050
## LOC 0.22 0.04 0.13 0.30 1.00 14566
## FixCount -0.02 0.03 -0.07 0.04 1.00 19173
## zi_AlgorithmBugspots 1.83 1.10 -0.11 4.20 1.00 20555
## Tail_ESS
## Intercept 6750
## zi_Intercept 9574
## AlgorithmBugspots 12360
## LOC 12457
## FixCount 12357
## zi_AlgorithmBugspots 11032
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## phi 29.65 1.99 25.88 33.64 1.00 20686 11503
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Based on a 95% z-interval on the elpd differences, M3 and M4 perform equally well. As M3 is the simpler model, we use it as our candidate model.
loo_maucec51 = loo(maucec51)
loo_maucec52 = loo(maucec52)
loo_maucec53 = loo(maucec53)
loo_maucec54 = loo(maucec54)
loo_compare(loo_maucec51, loo_maucec52, loo_maucec53, loo_maucec54)
## elpd_diff se_diff
## maucec54 0.0 0.0
## maucec53 -0.1 0.8
## maucec52 -0.2 0.8
## maucec51 -122.9 17.6
Bürkner, Paul-Christian. 2020. Brms: Bayesian Regression Models Using ’Stan’. https://CRAN.R-project.org/package=brms.
Gabry, Jonah, and Tristan Mahr. 2020. Bayesplot: Plotting for Bayesian Models. https://CRAN.R-project.org/package=bayesplot.